Cognate and Misspelling Features for Natural Language Identification

Authors

Garrett Nicolai

Bradley Hauer

Mohammad Salameh

Lei Yao

Grzegorz Kondrak

Abstract

We apply Support Vector Machines to differentiate between 11 native languages in the 2013 Native Language Identification Shared Task. We expand a set of common language identification features to include cognate interference and spelling mistakes. Our best results are obtained with a classifier which includes both the cognate and the misspelling features, as well as word unigrams, word bigrams, character bigrams, and syntax production rules.
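The abstract names the feature groups but not how they are computed. The Python sketch below shows one plausible way to turn an essay into a sparse feature vector of word unigrams, word bigrams and character bigrams. The misspelling and cognate functions are hypothetical stand-ins, not the authors' method: the out-of-lexicon test, the `lexicon` set and the `cognates` mapping (an English word mapped to the native languages in which it has a cognate) are illustrative assumptions. Syntax production rules are omitted because they require a constituency parser.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set

PUNCT = ".,;:!?\"'()"


def tokenize(text: str) -> List[str]:
    # Lowercase, split on whitespace, and strip surrounding punctuation.
    tokens = (w.strip(PUNCT) for w in text.lower().split())
    return [t for t in tokens if t]


def ngram_features(text: str) -> Counter:
    feats: Counter = Counter()
    words = tokenize(text)
    # Word unigrams.
    for w in words:
        feats["w1=" + w] += 1
    # Word bigrams: adjacent token pairs.
    for a, b in zip(words, words[1:]):
        feats["w2=" + a + "_" + b] += 1
    # Character bigrams over the lowercased raw text (spaces included).
    chars = text.lower()
    for a, b in zip(chars, chars[1:]):
        feats["c2=" + a + b] += 1
    return feats


def misspelling_features(text: str, lexicon: Set[str]) -> Counter:
    # HYPOTHETICAL stand-in: every token missing from `lexicon` is treated
    # as a misspelling and kept as its own feature.
    return Counter("miss=" + t for t in tokenize(text) if t not in lexicon)


def cognate_features(text: str, cognates: Dict[str, Sequence[str]]) -> Counter:
    # HYPOTHETICAL stand-in: `cognates` maps an English word to the native
    # languages in which it has a cognate; each hit votes for those languages.
    feats: Counter = Counter()
    for t in tokenize(text):
        for l1 in cognates.get(t, ()):
            feats["cog=" + l1] += 1
    return feats


def extract_features(text: str, lexicon: Set[str],
                     cognates: Dict[str, Sequence[str]]) -> Counter:
    # Merge all feature groups into one sparse vector (Counter.update adds counts).
    feats = ngram_features(text)
    feats.update(misspelling_features(text, lexicon))
    feats.update(cognate_features(text, cognates))
    return feats


if __name__ == "__main__":
    essay = "I visit the libary."
    lexicon = {"i", "visit", "the", "library"}
    cognates = {"visit": ("fre", "spa")}
    print(extract_features(essay, lexicon, cognates))
```

The resulting `Counter` objects could then be vectorized and passed to a linear SVM, for example with scikit-learn's `DictVectorizer` and `LinearSVC`; which SVM toolkit the authors actually used is not stated here.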


Similar resources

Automatic cognate identification with gap-weighted string subsequences

In this paper, we describe the problem of cognate identification in NLP. We introduce the idea of gap-weighted subsequences for discriminating cognates from non-cognates. We also propose a scheme to integrate phonetic features into the feature vectors for cognate identification. We show that subsequence-based features perform better than a state-of-the-art classifier for the purpose of cognate ide...


Effect of Cognate-Based Instruction Strategy on Vocabulary Learning Among Iranian EFL Learners

Cognates are words that share phonetic, orthographic, and semantic similarities across two or more languages. The aim of the present study was to investigate the effect of a cognate-based instruction strategy on vocabulary learning among Iranian EFL learners. To achieve this goal, 80 EFL learners (15-27 years old) took part in the study; all of them were...


Siamese convolutional networks based on phonetic features for cognate identification

In this paper, we explore the use of convolutional networks (ConvNets) for the purpose of cognate identification. We compare our architecture with binary classifiers based on string similarity measures on different language families. Our experiments show that convolutional networks achieve competitive results across concepts and across language families at the task of cognate identification.


Offline Language-free Writer Identification based on Speeded-up Robust Features

This article proposes offline language-free writer identification based on speeded-up robust features (SURF), which proceeds through training, enrollment, and identification stages. In all stages, an isotropic Box filter is first used to segment the handwritten text image into word regions (WRs). Then, the SURF descriptors (SUDs) of each word region and the corresponding scales and orientations (SOs) are extr...



Publication date: 2013
